
Linear Transformations and Matrices

Section 5.1: Linear Transformations

Core Idea: Generalizing Structure-Preserving Maps

Having established the general notion of a Vector Space in Chapter 4, Section 5.1 aims to define the most important type of function between vector spaces: those that respect the underlying vector space operations (addition and scalar multiplication). These are called Linear Transformations.

Rationale for the Definition (Definition 5.1.1):

  1. What properties should a "natural" or "structure-preserving" map T: V → W have? Since vector spaces are defined by their addition and scalar multiplication, a natural map should interact nicely with these.
  2. Preserving Addition: If we add two vectors in V and then apply T, we should get the same result as if we first apply T to each vector individually and then add the results in W. That is, T(v1+v2)=T(v1)+T(v2).
  3. Preserving Scalar Multiplication: If we scale a vector in V by c and then apply T, we should get the same result as if we first apply T to the vector and then scale the result in W by c. That is, T(cv)=cT(v).
  4. The Definition: These two essential properties become the definition of a linear transformation. It's a function between vector spaces that "plays nice" with the operations that define those spaces.
  5. Broad Applicability: This definition is abstract enough to apply not just to functions between R^n and R^m, but also to functions involving spaces of functions, matrices, etc., as seen in the examples (differentiation, integration). This abstraction allows for unifying concepts across different mathematical domains (a small sketch checking these properties for differentiation follows).
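As a concrete check, here is a minimal sympy sketch (the polynomials are made-up examples, not from the text) showing that differentiation satisfies the two defining properties of a linear transformation on a space of polynomials:

```python
# sympy sketch (illustrative polynomials): differentiation T(f) = f' preserves
# addition and scalar multiplication, so it is a linear transformation.
import sympy as sp

x, c = sp.symbols('x c')
f = 3*x**2 + x           # one polynomial "vector"
g = x**3 - 5             # another polynomial "vector"

T = lambda p: sp.diff(p, x)    # the transformation: differentiation

# Preserving addition: T(f + g) = T(f) + T(g)
print(sp.expand(T(f + g) - (T(f) + T(g))) == 0)   # True

# Preserving scalar multiplication: T(c*f) = c*T(f)
print(sp.expand(T(c*f) - c*T(f)) == 0)            # True
```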

Rationale for Key Theorems (Uniqueness, Existence, Representation):

  1. Linearity Means Determined by Basis Action (Theorem 5.1.5): If we know what a linear transformation T does to a set of vectors that span the domain V (in particular, a basis), then we know what it does to every vector in V.
    • Rationale: Any v ∈ V can be written as a linear combination of the spanning vectors, say v = c1u1 + ... + cnun. Because T preserves addition and scalar multiplication, we have T(v) = c1T(u1) + ... + cnT(un). So, the value of T(v) is completely determined by the values T(ui). This means linear transformations are highly structured; their behavior isn't arbitrary but fixed by their action on a relatively small set.
  2. Freedom to Define on a Basis (Theorem 5.1.6): We can define a linear transformation T: V → W simply by choosing where each basis vector of V should map to in W. For any choice of target vectors w1, ..., wn ∈ W, there exists a unique linear transformation T such that T(ui) = wi for a basis (u1, ..., un) of V.
    • Rationale: This theorem guarantees existence and flexibility. It tells us that bases provide complete freedom to construct linear transformations with desired properties. We don't need a "formula" for T; specifying its action on a basis is enough to uniquely determine a valid linear T.
  3. Matrix Representation [T]_α^β (Definition 5.1.7): Since T is determined by T(u1), ..., T(un) (where α = (u1, ..., un) is a basis for V), and each T(ui) can be uniquely represented by its coordinates relative to a basis β of W, we can encode the entire transformation T as a matrix whose columns are these coordinate vectors [T(ui)]_β.
    • Rationale: This provides a concrete way to represent potentially abstract linear transformations using a grid of numbers (a matrix). It bridges the abstract theory of linear transformations with the computational tools of matrix algebra. The choice of bases α and β acts like choosing coordinate systems or "languages" to describe the transformation.
  4. Matrix Operations Mirror Transformation Operations (Prop 5.1.15, 5.1.18): The definitions for adding matrices, multiplying a matrix by a scalar, and multiplying matrices are specifically chosen so that they correspond precisely to adding linear transformations, scaling linear transformations, and composing linear transformations, respectively, when viewed through their matrix representations relative to fixed bases.
    • Rationale: This ensures that matrix algebra is a faithful computational model for the algebra of linear transformations. We can perform manipulations on matrices (which are often easier to compute with) and know that the results accurately reflect operations on the underlying functions. The complex-looking definition of matrix multiplication, for example, is exactly what's needed to make [T∘S] = [T][S] work out (see the sketch below).
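As a quick numerical illustration of that last point (a hypothetical example using the standard bases, so each matrix is just the usual matrix of the map), composing two matrix maps gives the same result as multiplying their matrices:

```python
# numpy sketch (made-up matrices, standard bases assumed): applying S and then
# T agrees with applying the single matrix [T][S], i.e. [T∘S] = [T][S].
import numpy as np

S = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])       # S: R^2 -> R^3
T = np.array([[2.0, 0.0, 1.0],
              [1.0, -1.0, 4.0]])  # T: R^3 -> R^2

v = np.array([5.0, -2.0])

composed = T @ (S @ v)    # apply S, then T (function composition)
product = (T @ S) @ v     # apply the product matrix [T][S]

print(np.allclose(composed, product))   # True
```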

In summary, Section 5.1 generalizes the idea of "structure-preserving maps" between vector spaces. The rationale is to define transformations based on the core vector space operations (addition and scalar multiplication), allowing the theory to apply broadly. Key results establish that these transformations are determined by their action on a basis and can be represented concretely by matrices once bases are chosen, with matrix operations directly mirroring the operations on the transformations themselves.

Proposition 5.1.2

Let m, n ∈ N+ and suppose that we have numbers ai,j ∈ R for all i, j ∈ N with both 1 ≤ i ≤ m and 1 ≤ j ≤ n. Define T: R^n → R^m by

T((x1, x2, ..., xn)) = ( a1,1x1 + a1,2x2 + ... + a1,nxn,
                         a2,1x1 + a2,2x2 + ... + a2,nxn,
                         ...,
                         am,1x1 + am,2x2 + ... + am,nxn )

We then have that T is a linear transformation.

Proposition 5.1.3

Let V be a vector space and let α be a basis of V with n elements. The function Coord_α: V → R^n is a linear transformation.

Proposition 5.1.4

Let T: V → W be a linear transformation. We have the following:

  1. T(0_V) = 0_W (where 0_V is the zero vector of V, and 0_W is the zero vector of W)
  2. T(-v) = -T(v) for all v ∈ V
  3. T(c1v1 + c2v2) = c1T(v1) + c2T(v2) for all v1, v2 ∈ V and all c1, c2 ∈ R
Theorem 5.1.5

Let V and W be vector spaces, and suppose that Span(u1, u2, ..., un) = V. Suppose that T: V → W and S: V → W are linear transformations with the property that T(ui) = S(ui) for all i ∈ {1, 2, ..., n}. We then have that T = S, i.e. T(v) = S(v) for all v ∈ V.

Theorem 5.1.6

Let V and W be vector spaces, and suppose that (u1, u2, ..., un) is a basis of V. Let w1, w2, ..., wn ∈ W. There exists a unique linear transformation T: V → W with T(ui) = wi for all i ∈ {1, 2, ..., n}.

Proposition 5.1.9

Let T: V → W be a linear transformation, let α = (u1, u2, ..., un) be a basis for V, and let β = (w1, w2, ..., wm) be a basis for W. Suppose that

[T]_α^β = ( a1,1  a1,2  ...  a1,n
            a2,1  a2,2  ...  a2,n
            ...
            am,1  am,2  ...  am,n )

and that [v]_α = (c1, c2, ..., cn). We then have that

[T(v)]_β = c1[T(u1)]_β + c2[T(u2)]_β + ... + cn[T(un)]_β,

and hence

[T(v)]_β = ( a1,1c1 + a1,2c2 + ... + a1,ncn,
             a2,1c1 + a2,2c2 + ... + a2,ncn,
             ...,
             am,1c1 + am,2c2 + ... + am,ncn ).
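A small numpy sketch of this proposition (the matrix and basis below are made-up; V = W = R^2 with a single basis α = β): the columns of [T]_α^β are the β-coordinates of T(u1) and T(u2), and multiplying by [v]_α reproduces [T(v)]_β.

```python
# numpy sketch (hypothetical numbers): verify [T(v)]_β = [T]_α^β [v]_α on R^2
# with one non-standard basis α = β = (u1, u2).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])        # T in standard coordinates
U = np.array([[1.0, 1.0],
              [1.0, -1.0]])       # columns u1, u2: the basis α = β

coord = lambda x: np.linalg.solve(U, x)    # Coord_α: solve U c = x

# Columns of [T]_α^β are the coordinates of T(u1), T(u2).
T_matrix = np.column_stack([coord(A @ U[:, i]) for i in range(2)])

v = np.array([4.0, -7.0])
lhs = coord(A @ v)                # [T(v)]_β computed directly
rhs = T_matrix @ coord(v)         # [T]_α^β [v]_α

print(np.allclose(lhs, rhs))      # True
```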

Proposition 5.1.12

Let V and W be vector spaces, and let T, S: V → W be linear transformations.

  1. The function T + S is a linear transformation
  2. For all r ∈ R, the function rT is a linear transformation
Proposition 5.1.15

Let V and W be vector spaces, let T: V → W and S: V → W be linear transformations, let α = (u1, u2, ..., un) be a basis for V, and let β = (w1, w2, ..., wm) be a basis for W. We have the following:

  1. [T+S]_α^β = [T]_α^β + [S]_α^β
  2. [cT]_α^β = c[T]_α^β for all c ∈ R
Proposition 5.1.16

Let V, Z, and W be vector spaces, and let T: Z → W and S: V → Z be linear transformations. We then have that T∘S: V → W is a linear transformation.

Proposition 5.1.18

Let V, Z, and W be finite-dimensional vector spaces, and let T: Z → W and S: V → Z be linear transformations. Let α be a basis for V, let γ be a basis for Z, and let β be a basis for W. We then have [T∘S]_α^β = [T]_γ^β [S]_α^γ.

Section 5.2: The Range and Null Space of a Linear Transformation

Core Idea: Understanding the Input-Output Behavior of Linear Transformations

Given a linear transformation T: V → W, this section introduces two fundamental subspaces that help us understand what T "does" to the vectors in V:

  1. Range (Where do the outputs land?): What vectors in the codomain W are actually "hit" by the transformation T?
  2. Null Space (What inputs get "lost"?): Which vectors in the domain V get mapped to the zero vector 0_W in the codomain?

Rationale for Range(T):

Rationale for Null Space (Kernel) Null(T) (Definition 5.2.1):

Rationale for Rank, Nullity, and the Rank-Nullity Theorem:

Rationale for Connecting Null Space/Range to Injectivity/Surjectivity/Solutions:

In summary, Section 5.2 defines and analyzes the range and null space because they are fundamental subspaces that reveal key aspects of a linear transformation's behavior – its outputs, what it collapses, its injectivity, and its surjectivity. The Rank-Nullity Theorem provides a crucial link between the dimensions of these spaces and the domain. These concepts are also essential for understanding the structure of solutions to linear systems.

Proposition 5.2.2

Let V and W be vector spaces, and let T: V → W be a linear transformation.

  1. Null(T) is a subspace of V
  2. range(T) is a subspace of W
Proposition 5.2.3

Let T: R^n → R^m be a linear transformation, and let b ∈ R^m. The following are equivalent:

  1. b ∈ range(T)
  2. b is a linear combination of the columns of [T]
    Thus, range(T) is the span of the columns of [T]
Corollary 5.2.4

Let T: R^n → R^m be a linear transformation, and let B be an echelon form of the matrix [T]. The following are equivalent:

  1. T is surjective
  2. Every row of B has a leading entry
Corollary 5.2.5

Let T: R^n → R^m be a linear transformation and let u1, u2, ..., un ∈ R^m be the columns of [T]. Let B be an echelon form of the matrix [T]. If we build the sequence consisting only of those ui such that the ith column of B has a leading entry, then we obtain a basis for range(T).
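A short sympy sketch of this corollary (the matrix is chosen arbitrarily for illustration): the pivot columns of [T] form a basis of range(T).

```python
# sympy sketch (made-up matrix): the columns of [T] that carry a leading entry
# in an echelon form give a basis for range(T), as in Corollary 5.2.5.
import sympy as sp

A = sp.Matrix([[1, 2, 0, 3],
               [2, 4, 1, 1],
               [3, 6, 1, 4]])

rref_form, pivot_cols = A.rref()    # reduced echelon form and pivot column indices
basis = [A.col(j) for j in pivot_cols]

print(pivot_cols)          # (0, 2): the columns with leading entries
print(basis)               # the corresponding columns of A, a basis for range(T)
print(A.columnspace())     # sympy's own basis for the column space, for comparison
```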

Proposition 5.2.6

Let V and W be vector spaces, and let T: V → W be a linear transformation.

  1. If T(v) = b and z ∈ Null(T), then T(v + z) = b
  2. If T(v1) = b and T(v2) = b, then v1 - v2 ∈ Null(T)
Corollary 5.2.7

Let T: V → W be a linear transformation, and let b ∈ W. Suppose that v ∈ V is such that T(v) = b. We have {x ∈ V : T(x) = b} = {v + z : z ∈ Null(T)}.
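A quick sympy sketch of this corollary (the system is a made-up example): every solution of T(x) = b is a fixed particular solution plus an element of the null space.

```python
# sympy sketch (hypothetical system): the solutions of Ax = b form the set
# {v + z : z in Null(T)} for any one particular solution v (Corollary 5.2.7).
import sympy as sp

A = sp.Matrix([[1, 2, 1],
               [0, 1, 1]])
b = sp.Matrix([3, 1])

v = sp.Matrix([1, 1, 0])      # one particular solution: A*v == b
print(A * v == b)             # True

z = 7 * A.nullspace()[0]      # an arbitrary element of Null(T)
print(A * (v + z) == b)       # still True: v + z solves the same system
```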

Corollary 5.2.9

Let T: R^n → R^m be a linear transformation, and let B be an echelon form of [T]. We then have that rank(T) is the number of leading entries in B.

Theorem 5.2.10 (Rank-Nullity Theorem)

Let T: V → W be a linear transformation with V and W finite-dimensional vector spaces. We then have that rank(T) + nullity(T) = dim(V).
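A quick numerical check of the theorem (the 3×5 matrix below is an arbitrary illustration) using sympy:

```python
# sympy sketch (arbitrary 3x5 example): rank(T) + nullity(T) = dim(V), where
# dim(V) is the number of columns of [T].
import sympy as sp

A = sp.Matrix([[1, 0, 2, -1, 3],
               [0, 1, 1, 1, 0],
               [1, 1, 3, 0, 3]])    # a map T: R^5 -> R^3 (third row = first + second)

rank = A.rank()
nullity = len(A.nullspace())

print(rank, nullity, rank + nullity == A.cols)   # 2 3 True
```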

Proposition 5.2.11

Let T: V → W be a linear transformation. We have that T is injective if and only if Null(T) = {0_V}.

Proposition 5.2.12

Let V and W be vector spaces. Let T: V → W be an injective linear transformation and let (u1, u2, ..., un) be a linearly independent sequence in V. We then have that (T(u1), T(u2), ..., T(un)) is a linearly independent sequence in W.

Proposition 5.2.13

Let V and W be vector spaces. Let T: V → W be a surjective linear transformation and assume that Span(u1, u2, ..., un) = V. We then have that Span(T(u1), T(u2), ..., T(un)) = W.

Corollary 5.2.14

Let V and W be finite-dimensional vector spaces, and let n = dim(V) and m = dim(W). Let T: V → W be a linear transformation.

  1. If T is injective, then n ≤ m
  2. If T is surjective, then m ≤ n
  3. If T is bijective, then m = n
Proposition 5.2.15

Suppose that T: V → W is a bijective linear transformation. We then have that the function T⁻¹: W → V is a linear transformation.

Proposition 5.2.17

Let A be an n×n matrix, and let B be an echelon form of A. We then have that A is invertible if and only if every row and every column of B has a leading entry.

Section 5.3: Determinants

Core Idea: Generalizing Signed Area/Volume and Linking it to Matrix Properties

Remember back in Section 3.4 we defined the determinant for 2×2 matrices, where the determinant of the matrix with rows (a, b) and (c, d) is ad - bc, and interpreted it as the signed area of the parallelogram formed by the column (or row) vectors. Section 5.3 aims to generalize this concept to n×n matrices. The key goals are:

  1. Geometric Intuition: To define a number associated with n vectors in R^n (or an n×n matrix) that represents the signed n-dimensional volume of the parallelepiped they form. The sign should indicate orientation (like right-hand vs. left-hand rule in R^3).
  2. Algebraic Properties: To find a function that behaves predictably and has useful algebraic properties, especially concerning matrix operations and invertibility.

The Rationale - Defining the Determinant Axiomatically (Definition 5.3.1):

The Rationale - Connecting Determinants to Row Operations and Properties:

Defining the determinant of a matrix A as the determinant function applied to its rows (Definition 5.3.6) allows us to understand how elementary row operations affect the determinant, based on the axioms:

  1. Swapping Rows (Prop 5.3.3): Swapping two rows multiplies the determinant by -1.
    • Rationale: Follows algebraically from the axioms. Geometrically corresponds to changing the orientation.
  2. Scaling a Row (Axiom 3): Multiplying a row by c multiplies the determinant by c.
    • Rationale: Directly from the scaling property of the determinant function.
  3. Adding a Multiple of One Row to Another (Prop 5.3.4): This operation does not change the determinant.
    • Rationale: Follows algebraically from linearity and the degeneracy property (f(..., vi, ..., cvi, ...) = 0 whenever one argument is a scalar multiple of another). Geometrically, this corresponds to a "shear" transformation of the parallelepiped, which preserves volume.

The Rationale - Computational Methods and Key Theorems:

  1. Computation via Row Reduction: The properties above provide a practical method to compute determinants. Use row operations (mostly row combinations, which don't change the determinant, and swaps, which just flip the sign) to reduce the matrix to an upper triangular form. The determinant of a triangular matrix is just the product of the diagonal entries (Prop 5.3.10). Keep track of the sign changes from swaps. A short computational sketch of this method follows this list.
    • Rationale: This leverages the efficient Gaussian elimination process and the simple determinant calculation for triangular matrices.
  2. Cofactor Expansion (Theorem 5.3.14): Provides a recursive formula to compute determinants.
    • Rationale: This formula arises naturally when fully expanding the determinant definition using multilinearity. It connects the determinant of an n×n matrix to determinants of smaller (n-1)×(n-1) matrices. While often slower than row reduction for large matrices, it's important theoretically and useful for smaller cases (like 3×3).
  3. Determinant and Invertibility (Corollary 5.3.11): A is invertible if and only if det(A) ≠ 0.
    • Rationale: This is arguably the most important property algebraically. If det(A) = 0, it means the rows are linearly dependent (Prop 5.3.5, Prop 5.3.8), so the matrix can't be row reduced to the identity, hence it's not invertible. Conversely, if A is invertible, its RREF is I, and det(I) = 1 ≠ 0. Since row operations only multiply the determinant by non-zero numbers, det(A) must have been non-zero. Geometrically, invertibility requires the transformation not to collapse space into a lower dimension, meaning the volume factor det(A) must be non-zero.
  4. Determinant of a Product (Theorem 5.3.15): det(AB)=det(A)det(B).
    • Rationale: This connects determinants with matrix multiplication (and thus function composition). The volume scaling factor of a composition of transformations is the product of the individual scaling factors.
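The row-reduction method from item 1 above, as a short Python sketch (the 3×3 matrix is a made-up example; a simple pivot search stands in for full pivoting):

```python
# Python sketch (illustrative): compute det(A) by Gaussian elimination, using
# the facts that row combinations leave the determinant unchanged, each row
# swap flips its sign, and a triangular matrix's determinant is the product of
# its diagonal entries.
import numpy as np

def det_by_row_reduction(A):
    A = A.astype(float)
    n = A.shape[0]
    sign = 1.0
    for j in range(n):
        # Find a row at or below j with a nonzero entry in column j.
        pivot = next((i for i in range(j, n) if abs(A[i, j]) > 1e-12), None)
        if pivot is None:
            return 0.0                        # dependent rows: determinant is 0
        if pivot != j:
            A[[j, pivot]] = A[[pivot, j]]     # row swap: multiply det by -1
            sign = -sign
        for i in range(j + 1, n):
            A[i] -= (A[i, j] / A[j, j]) * A[j]    # row combination: det unchanged
    return sign * np.prod(np.diag(A))

A = np.array([[0, 2, 1],
              [3, -1, 2],
              [1, 1, 1]])
print(det_by_row_reduction(A), np.linalg.det(A))   # both 2.0 (up to rounding)
```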

In essence, Section 5.3 defines the determinant as a function capturing signed volume and satisfying key linearity properties. This function provides a powerful tool linking the geometry of transformations (volume scaling, orientation) with the algebra of matrices (invertibility, row operations) and provides methods for computation.

Theorem 5.3.2

For each n ∈ N+, there is a unique function f satisfying the determinant properties. We call this function det.

Proposition 5.3.3

Let v1, v2, ..., vn ∈ R^n. If i < j, then det(v1, ..., vi, ..., vj, ..., vn) = -det(v1, ..., vj, ..., vi, ..., vn).

Proposition 5.3.4

Let v1, v2, ..., vn ∈ R^n, let c ∈ R, and suppose that i < j. We then have det(v1, ..., vi, ..., vj + cvi, ..., vn) = det(v1, ..., vi, ..., vj, ..., vn) and det(v1, ..., vi + cvj, ..., vj, ..., vn) = det(v1, ..., vi, ..., vj, ..., vn).

Proposition 5.3.5

Let v1, v2, ..., vn ∈ R^n and assume that (v1, v2, ..., vn) is linearly dependent. We then have that det(v1, v2, ..., vn) = 0.

Proposition 5.3.8

Suppose that A and B are n×n matrices that are row equivalent. We then have that det(A) = 0 if and only if det(B) = 0.

Proposition 5.3.9

If A is an n×n diagonal matrix, say

A = ( a1,1   0    ...   0
       0    a2,2  ...   0
       ...
       0     0    ...  an,n ),

then det(A) = a1,1 · a2,2 · ... · an,n.

Proposition 5.3.10

If A is an n×n upper triangular matrix, i.e. if

A = ( a1,1  a1,2  ...  a1,n
       0    a2,2  ...  a2,n
       ...
       0     0    ...  an,n ),

then det(A) = a1,1 · a2,2 · ... · an,n.

Corollary 5.3.11

If A is a square matrix, then A is invertible if and only if det(A) ≠ 0.

Theorem 5.3.12

For any n×n matrix A, we have det(A) = det(A^T).

Theorem 5.3.14

Let A be an n×n matrix, and for each i and j let Ci,j denote the (i, j) cofactor of A. For any i, we have det(A) = ai,1Ci,1 + ai,2Ci,2 + ... + ai,nCi,n, and for any j, we have det(A) = a1,jC1,j + a2,jC2,j + ... + an,jCn,j.
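A short Python sketch of cofactor expansion along the first row (this assumes the usual cofactor Ci,j = (-1)^(i+j) times the determinant of the matrix obtained by deleting row i and column j; the 3×3 matrix is an arbitrary example):

```python
# Python sketch: recursive determinant via cofactor expansion along row 1,
# using C_ij = (-1)**(i+j) * det(minor_ij), where the minor is obtained by
# deleting row i and column j.
def det_cofactor(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 0, column j
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

A = [[0, 2, 1],
     [3, -1, 2],
     [1, 1, 1]]
print(det_cofactor(A))   # 2
```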

Theorem 5.3.15

If A and B are n×n matrices, then det(AB)=det(A)det(B).

Section 5.4: Eigenvalues and Eigenvectors

Core Idea: Finding the "Natural Axes" of a Linear Transformation

When a linear transformation T maps a vector space V to itself (i.e., T: V → V), we can ask if there are any special vectors whose direction is unchanged (or simply reversed) by the transformation. That is, applying T to such a vector only scales it; it doesn't rotate it off its original line through the origin. These special vectors and their corresponding scaling factors are the eigenvectors and eigenvalues.

Rationale for Eigenvalues and Eigenvectors (Definition 5.4.1 / 5.4.3):

  1. Geometric Motivation: Imagine applying a linear transformation T: R^2 → R^2. Most vectors will be moved and rotated to point in a different direction from where they started. However, some special vectors v might just get stretched or shrunk, so T(v) is parallel to v. This means T(v) = λv for some scalar λ. These vectors v (which must be non-zero by definition) point along the "natural axes" or fundamental directions of the transformation T. The scalar λ tells us the scaling factor along that direction.
  2. Why T: V → V? The equation T(v) = λv only makes sense if the input v and the output T(v) live in the same vector space V.
  3. Simplifying Transformations: If we can find a basis consisting entirely of eigenvectors, then the action of T becomes very simple when described relative to that basis – it's just scaling along the basis directions. This was the motivation hinted at in Section 3.2 when a change of basis made the matrix diagonal.

Rationale for Eigenspace (Proposition 5.4.2):

Rationale for the Computational Approach (Matrices, Null Spaces, Determinants):

How do we actually find these special vectors and scalars for T: R^n → R^n (represented by the matrix A = [T])?

  1. Connecting to Null Space (Prop 5.4.4): The core algebraic trick is rewriting the eigenvalue equation:
    Av = λv  ⇔  Av - λv = 0  ⇔  Av - λIv = 0  ⇔  (A - λI)v = 0.
    • Rationale: This converts the eigenvalue problem into finding non-zero vectors in the null space of a different matrix, A - λI. We already know how to find null spaces using Gaussian elimination.
  2. Finding Eigenvalues (Corollary 5.4.5 & Definition 5.4.6):
    • A scalar λ is an eigenvalue ⇔ there exists a non-zero v such that (A - λI)v = 0.
    • This means λ is an eigenvalue ⇔ Null(A - λI) ≠ {0}.
    • For a square matrix M, Null(M) ≠ {0} ⇔ M is not invertible ⇔ det(M) = 0.
    • Therefore, λ is an eigenvalue ⇔ det(A - λI) = 0.
    • Rationale: This gives us a computational method! The expression det(A - λI) is a polynomial in λ (the characteristic polynomial). Its roots are precisely the eigenvalues of A. Finding eigenvalues is reduced to finding roots of a polynomial.
  3. Finding Eigenvectors/Eigenspaces: Once an eigenvalue λ is found (by solving det(A - λI) = 0), the corresponding eigenvectors are simply the non-zero vectors in Null(A - λI). The eigenspace is the entire null space Null(A - λI).
    • Rationale: This links back to the null space calculation method from Sections 4.2 and 5.2 (solving the homogeneous system (A - λI)x = 0). A short computational sketch follows this list.
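Here is that computation in a small sympy sketch (the 2×2 matrix is a made-up example):

```python
# sympy sketch (illustrative 2x2 matrix): eigenvalues are the roots of the
# characteristic polynomial det(A - λI), and the eigenvectors for each
# eigenvalue are the non-zero vectors of Null(A - λI).
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[4, 1],
               [2, 3]])

char_poly = (A - lam * sp.eye(2)).det()
print(sp.expand(char_poly))              # lambda**2 - 7*lambda + 10

eigenvalues = sp.solve(char_poly, lam)
print(eigenvalues)                       # [2, 5]

for ev in eigenvalues:
    eigenspace = (A - ev * sp.eye(2)).nullspace()   # basis of the eigenspace
    print(ev, eigenspace)
```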

Rationale for Diagonalization (Definition 5.4.9 & Corollary 5.4.11):

In summary, Section 5.4 introduces eigenvalues and eigenvectors as the scaling factors and invariant directions that simplify the understanding of a linear transformation T:VV. The rationale is to leverage these to understand T's geometry and to simplify computations. The section provides an algebraic pathway (via the characteristic polynomial derived from determinants and the connection to null spaces) to find these eigenvalues and eigenvectors. Diagonalization is presented as the ideal outcome where the transformation becomes simple scaling relative to an eigenbasis.

Proposition 5.4.2

Let T: V → V be a linear transformation and let λ ∈ R. The set W = {v ∈ V : T(v) = λv}, which is the set of all eigenvectors of T corresponding to λ together with 0, is a subspace of V.

Proposition 5.4.4

Let A be an n×n matrix, let v ∈ R^n, and let λ ∈ R. We have that Av = λv if and only if v ∈ Null(A - λI). Therefore, v is an eigenvector of A corresponding to λ if and only if v ≠ 0 and v ∈ Null(A - λI).

Corollary 5.4.5

Let A be an n×n matrix and let λ ∈ R. We have that λ is an eigenvalue of A if and only if Null(A - λI) ≠ {0}.

Proposition 5.4.7

If A is an n×n matrix, then the characteristic polynomial of A is a polynomial of degree n.

Proposition 5.4.10

Let V be a finite-dimensional vector space, let T: V → V be a linear transformation, and let α = (u1, u2, ..., un) be a basis of V. The following are equivalent:

  1. [T]_α^α is a diagonal matrix
  2. u1, u2, ..., un are all eigenvectors of T
    Furthermore, in this case, the diagonal entries of [T]_α^α are the eigenvalues corresponding to u1, u2, ..., un.
Corollary 5.4.11

Let V be a finite-dimensional vector space and let T: V → V be a linear transformation. We then have that T is diagonalizable if and only if there exists a basis of V consisting entirely of eigenvectors of T.
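A closing sympy sketch (reusing the made-up matrix from the eigenvalue sketch above): when a basis of eigenvectors exists, placing them as the columns of a matrix P makes P⁻¹AP diagonal, with the corresponding eigenvalues on the diagonal.

```python
# sympy sketch (illustrative matrix): diagonalization. The columns of P form an
# eigenvector basis, and P^{-1} A P is diagonal with the eigenvalues.
import sympy as sp

A = sp.Matrix([[4, 1],
               [2, 3]])

P, D = A.diagonalize()         # raises an error if A is not diagonalizable
print(P)                       # columns: eigenvectors of A
print(D)                       # diagonal matrix with the eigenvalues 2 and 5
print(P.inv() * A * P == D)    # True
```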